Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer

Abstract

Real-world recognition systems often encounter the challenge of unseen labels. To identify such unseen labels, multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge by a pre-trained textual label embedding (e.g., GloVe). However, such methods only exploit single-modal knowledge from a language model, while ignoring the rich semantic information inherent in image-text pairs. Instead, recently developed open-vocabulary (OV) based methods succeed in exploiting such information of image-text pairs in object detection, and achieve impressive performance. Inspired by the success of OV-based methods, we propose a novel open-vocabulary framework, named multi-modal knowledge transfer (MKT), for multi-label classification. Specifically, our method exploits multi-modal knowledge of image-text pairs based on a vision and language pre-training (VLP) model. To facilitate transferring the image-text matching ability of the VLP model, knowledge distillation is employed to guarantee the consistency of image and label embeddings, along with prompt tuning to further update the label embeddings. To further enable the recognition of multiple objects, a simple but effective two-stream module is developed to capture both local and global features. Extensive experimental results show that our method significantly outperforms state-of-the-art methods on public benchmark datasets.
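The general idea of scoring an image against an open label set using both global and local features can be illustrated with a small sketch. This is not the MKT implementation: the abstract does not say how the two streams are combined, so the equal-weight average, the top-k pooling over patches, the threshold, and all function and parameter names below (`two_stream_scores`, `predict`, `top_k`) are assumptions chosen for illustration. The embeddings are toy vectors standing in for the outputs of a VLP image encoder and text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def two_stream_scores(global_emb, patch_embs, label_embs, top_k=2):
    """Score every label from a global and a local stream.

    Assumption (not from the paper): the global score is the cosine
    similarity between the whole-image embedding and the label embedding,
    the local score is the mean of the top-k patch/label similarities,
    and the two are averaged with equal weight.
    """
    scores = {}
    for label, emb in label_embs.items():
        global_score = cosine(global_emb, emb)
        patch_sims = sorted((cosine(p, emb) for p in patch_embs), reverse=True)
        k = min(top_k, len(patch_sims))
        local_score = sum(patch_sims[:k]) / k if k else 0.0
        scores[label] = 0.5 * (global_score + local_score)
    return scores


def predict(scores, threshold=0.5):
    """Return every label whose score reaches the (hypothetical) threshold."""
    return sorted(label for label, s in scores.items() if s >= threshold)
```

Because labels enter only through their embeddings, a label unseen during training can be scored by adding its text embedding to `label_embs`, which is the sense in which such a classifier is open-vocabulary.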

Similar resources

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...

Air pollution prediction via multi-label classification

A Bayesian network classifier can be used to estimate the probability of an air pollutant exceeding a certain threshold. Yet multiple predictions are typically required regarding variables which are stochastically dependent, such as ozone measured at multiple stations or assessed according to different indicators. The common practice (independent approach) is to devise an independent classi...

Multi-modal Semantic Place Classification

The ability to represent knowledge about space and its position therein is crucial for a mobile robot. To this end, topological and semantic descriptions are gaining popularity for augmenting purely metric space representations. In this paper we present a multi-modal place classification system that allows a mobile robot to identify places and recognize semantic categories in an indoor environm...

Multi-Objective Multi-Label Classification

Multi-label classification refers to the task of predicting potentially multiple labels for a given instance. Conventional multi-label classification approaches focus on the single objective setting, where the learning algorithm optimizes over a single performance criterion (e.g. Ranking Loss) or a heuristic function. The basic assumption is that the optimization over one single objective can i...

Multi-label Classification via Feature-aware Implicit Label Space Encoding

To tackle multi-label classification problems with many classes, label space dimension reduction (LSDR) has recently been proposed. It encodes the original label space into a low-dimensional latent space and uses a decoding process for recovery. In this paper, we propose a novel method termed FaIE to perform LSDR via Feature-aware Implicit label space Encoding. Unlike most previous work, the proposed ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i1.25159